Programming WCF Services : Queued Services - Delivery Failures (part 1) - Configuring the Dead-Letter Queue

9/7/2011 11:26:00 AM

A connected call may fail due to either communication failures or service-side errors. Similarly, a queued call can fail due to delivery failures or service-side playback errors. WCF provides dedicated error-handling mechanisms for both types of errors, and understanding them and integrating your error-handling logic with them is an intrinsic part of using queued services.

While MSMQ can guarantee delivery of a message if it is technically possible to do so, there are many cases in which delivery is simply not possible. These include, but are not limited to:

Timeouts and expiration: As you will see shortly, each message has a timestamp, and the message has to be delivered and processed within the configured timeout. Failure to do so will cause the delivery to fail.
Security mismatches: If the security credentials in the message (or the chosen authentication mechanism itself) do not match up with what the service expects, the service will reject the message.
Transactional mismatches: The client cannot use a local nontransactional queue while posting a message to a transactional service-side queue.
Network problems: If the underlying network fails or is simply unreliable, the message may never reach the service.
Machine crashes: The service machine may crash due to software or hardware failures and will not be able to accept the message to its queue.
Purges: Even if the message is delivered successfully, the administrator (or any application, programmatically) can purge messages out of the queue and prevent the service from processing them.
Quota breaches: Each queue has a quota controlling the maximum amount of data it can hold. If the quota is exceeded, future messages are rejected.

After every delivery failure, the message goes back to the client’s queue, where MSMQ will repeatedly try to deliver it. While in some cases, such as intermittent network failures or quota issues, the retries may eventually succeed, there are many cases where MSMQ will never succeed in delivering the message. In practical terms, a large enough number of retry attempts may be unacceptable and may create a dangerous amount of thrashing. Delivery-failure handling deals with how to let MSMQ know that it should not retry forever, how many attempts it should make before giving up, how much time can elapse before it gives up, and what it should do with the failed messages.

MsmqBindingBase offers a number of properties governing handling of delivery failures:

public abstract class MsmqBindingBase : Binding,...
{
   public TimeSpan TimeToLive
   {get;set;}

   //DLQ settings
   public Uri CustomDeadLetterQueue
   {get;set;}
   public DeadLetterQueue DeadLetterQueue
   {get;set;}

   //More members
}

1. The Dead-Letter Queue

In messaging systems, after an evident failure to deliver a message, that message goes to a special queue called the dead-letter queue (DLQ). The DLQ is somewhat analogous to a classic dead-letter mailbox at the main post office. In the context of this discussion, failure to deliver constitutes not only failure to reach the service-side queue, but also failure to commit the playback transaction. MSMQ on the client and on the service side constantly acknowledge to each other receipt and processing of messages. If the service-side MSMQ successfully receives and retrieves the message from the service-side queue (that is, if the playback transaction committed), it sends a positive acknowledgment (ACK) to the client-side MSMQ. The service-side MSMQ can also send a negative acknowledgment (NACK) to the client. When the client-side MSMQ receives a NACK, it posts the message to the DLQ. If the client-side MSMQ receives neither an ACK nor a NACK, the message is considered in-doubt.
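The acknowledgment rules above can be summarized as a small decision function. This is only a toy model for illustration: the Acknowledgment and MessageOutcome enums and the DeliveryModel class are hypothetical names, not part of MSMQ or WCF.

```csharp
using System;

//Hypothetical types, used only to model the ACK/NACK rules described in the text
enum Acknowledgment { Ack, Nack, None }
enum MessageOutcome { Delivered, DeadLettered, InDoubt }

static class DeliveryModel
{
   //Maps what the client-side MSMQ received to the fate of the message
   public static MessageOutcome Resolve(Acknowledgment acknowledgment)
   {
      switch(acknowledgment)
      {
         //The playback transaction committed on the service side
         case Acknowledgment.Ack:
            return MessageOutcome.Delivered;
         //Negative acknowledgment: the message is posted to the DLQ
         case Acknowledgment.Nack:
            return MessageOutcome.DeadLettered;
         //Neither ACK nor NACK arrived: the message is in-doubt
         default:
            return MessageOutcome.InDoubt;
      }
   }
}
```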

With MSMQ 3.0 (that is, on Windows XP and Windows Server 2003), the dead-letter queue is a system-wide queue. All failed messages from any application go to this single repository. With MSMQ 4.0 (that is, on Windows Vista, Windows Server 2008, and Windows 7 or later), you can configure a service-specific DLQ where only messages destined for that specific service go. Application-specific dead-letter queues greatly simplify both the administrator’s and the developer’s work.

Note:

When dealing with a nondurable queue, failed nontransactional messages go to a special system-wide DLQ.

2. Time to Live

With MSMQ, each message carries a timestamp initialized when the message is first posted to the client-side queue. In addition, every queued WCF message has a timeout, controlled by the TimeToLive property of MsmqBindingBase. After posting a message to the client-side queue, WCF mandates that the message must be delivered and processed within the configured timeout. Note that successful delivery to the service-side queue is not good enough—the call must be processed as well. The TimeToLive property is therefore somewhat analogous to the SendTimeout property of the connected bindings.

The TimeToLive property is relevant only to the posting client; it has no effect on the service side, nor can the service change it. TimeToLive defaults to one day. After continuously trying and failing to deliver (and process) a message for as long as TimeToLive allows, MSMQ stops trying and moves the message to the configured DLQ.

You can configure the time-to-live value either programmatically or administratively. For example, using a config file, here is how to configure a time to live of five minutes:

<bindings>
   <netMsmqBinding>
      <binding name = "ShortTimeout"
         timeToLive = "00:05:00"
      />
   </netMsmqBinding>
</bindings>
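The programmatic equivalent might look like the following minimal sketch. The MsmqBindingSamples class and its method name are hypothetical, introduced only for illustration. The TimeToLive property is the one listed on MsmqBindingBase earlier.

```csharp
using System;
using System.ServiceModel;

//Hypothetical helper class, for illustration only
static class MsmqBindingSamples
{
   public static NetMsmqBinding CreateShortTimeoutBinding()
   {
      NetMsmqBinding binding = new NetMsmqBinding();

      //Same effect as timeToLive = "00:05:00" in the config file
      binding.TimeToLive = TimeSpan.FromMinutes(5);

      return binding;
   }
}
```

Because the setting matters only to the posting client, a binding such as this would be used when creating the client-side proxy or channel factory.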

The main motivation for configuring a short timeout is dealing with time-sensitive calls that must be processed in a timely manner. However, time-sensitive queued calls go against the grain of disconnected queued calls in general: the more time-sensitive the calls are, the more questionable the use of queued services is in the first place. The correct way of viewing time to live is as a last-resort heuristic used to eventually bring to the attention of the administrator the fact that the message was not delivered, not as a way to enforce business-level interpretation of the message’s sensitivity.

3. Configuring the Dead-Letter Queue

MsmqBindingBase offers the DeadLetterQueue property, of the enum type DeadLetterQueue:

public enum DeadLetterQueue
{
   None,
   System,
   Custom
}

When DeadLetterQueue is set to DeadLetterQueue.None, WCF makes no use of a dead-letter queue. After a failure to deliver, WCF silently discards the message as if the call never happened. DeadLetterQueue.System is the default value of the property. As its name implies, it uses the system-wide DLQ: after a delivery failure, WCF moves the message from the client-side queue to the system-wide DLQ.

Note:

The system-wide DLQ is a transactional queue, so you must have the ExactlyOnce binding property set to its default value of true and the Durable property set to its default value of true.
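As a sketch, a binding section that spells out the system-wide DLQ and the two properties it depends on could look like this. All three attributes are already at their default values, so the section only makes the defaults explicit. The binding name SystemDLQ is a hypothetical name chosen for illustration.

```xml
<bindings>
   <netMsmqBinding>
      <binding name = "SystemDLQ"
         deadLetterQueue = "System"
         exactlyOnce = "true"
         durable = "true"
      />
   </netMsmqBinding>
</bindings>
```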

When DeadLetterQueue is set to DeadLetterQueue.Custom, the application can take advantage of a dedicated DLQ. DeadLetterQueue.Custom requires the use of MSMQ 4.0, and WCF verifies that at call time. In addition, WCF requires that the application specify the custom DLQ address in the CustomDeadLetterQueue property of the binding. The default value of CustomDeadLetterQueue is null, but when DeadLetterQueue.Custom is employed, CustomDeadLetterQueue cannot be null:

<netMsmqBinding>
   <binding name = "CustomDLQ"
       deadLetterQueue = "Custom"
       customDeadLetterQueue = "net.msmq://localhost/private/MyCustomDLQ">
   </binding>
</netMsmqBinding>

Conversely, when the DeadLetterQueue property is set to any value other than DeadLetterQueue.Custom, CustomDeadLetterQueue must be null.
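The custom DLQ can also be configured programmatically. The following is a minimal sketch that reuses the queue address from the config file above. The CustomDlqSamples class and its method name are hypothetical.

```csharp
using System;
using System.ServiceModel;

//Hypothetical helper class, for illustration only
static class CustomDlqSamples
{
   public static NetMsmqBinding CreateCustomDlqBinding()
   {
      NetMsmqBinding binding = new NetMsmqBinding();

      //Both properties must be set together: Custom requires a non-null address
      binding.DeadLetterQueue = DeadLetterQueue.Custom;
      binding.CustomDeadLetterQueue = new Uri("net.msmq://localhost/private/MyCustomDLQ");

      return binding;
   }
}
```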

It is important to realize that the custom DLQ is just another MSMQ queue. It is up to the client-side developer to also deploy a DLQ service that processes its messages. All WCF does on MSMQ 4.0 is automate the act of moving the message to the DLQ once a failure is detected.

3.1. Custom DLQ verification

If a custom DLQ is required, as with any other queue, the client should verify at runtime (before issuing queued calls) that the custom DLQ exists and, if necessary, create it. Following the pattern presented previously, you can automate and encapsulate this with the ServiceEndpoint extension method VerifyQueue() of QueuedServiceHelper, shown in Example 1.

Example 1. Verifying a custom DLQ

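using System.Diagnostics;
using System.Messaging;
using System.ServiceModel;
using System.ServiceModel.Description;

public static class QueuedServiceHelper
{
   public static void VerifyQueue(this ServiceEndpoint endpoint)
   {
      //Only MSMQ endpoints have queues to verify
      NetMsmqBinding binding = endpoint.Binding as NetMsmqBinding;
      if(binding != null)
      {
         //Create the transactional service queue if it is missing
         string queue = GetQueueFromUri(endpoint.Address.Uri);
         if(MessageQueue.Exists(queue) == false)
         {
            MessageQueue.Create(queue,true);
         }
         if(binding.DeadLetterQueue == DeadLetterQueue.Custom)
         {
            //Create the transactional custom DLQ if it is missing
            Debug.Assert(binding.CustomDeadLetterQueue != null);
            string DLQ = GetQueueFromUri(binding.CustomDeadLetterQueue);
            if(MessageQueue.Exists(DLQ) == false)
            {
               MessageQueue.Create(DLQ,true);
            }
         }
      }
   }
   //More members (GetQueueFromUri() is assumed to be among them)
}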
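On the client side, a call to VerifyQueue() before the first queued call might look like the sketch below. It assumes the QueuedServiceHelper class from Example 1 is available. The contract IMyContract, its MyMethod() operation, the QueuedClient class, and the endpoint name MyEndpoint are all hypothetical and stand in for the application's own definitions.

```csharp
using System.ServiceModel;

//Hypothetical queued contract; queued operations are one-way
[ServiceContract]
interface IMyContract
{
   [OperationContract(IsOneWay = true)]
   void MyMethod();
}

static class QueuedClient
{
   public static void CallService()
   {
      //"MyEndpoint" is a hypothetical client endpoint name from the config file
      ChannelFactory<IMyContract> factory = new ChannelFactory<IMyContract>("MyEndpoint");

      //Creates the service queue and, if one is configured, the custom DLQ when missing
      factory.Endpoint.VerifyQueue();

      IMyContract proxy = factory.CreateChannel();
      proxy.MyMethod();

      (proxy as ICommunicationObject).Close();
      factory.Close();
   }
}
```

Verifying through the endpoint keeps the queue names in one place: both the service queue address and the custom DLQ address are read from the same endpoint and binding configuration that the calls themselves use.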